ABSTRACT

In this paper, the method of total least squares is presented for a set of statistical data points that are subject to interval uncertainty. The Hansen-Bliek-Rohn method is applied to solve the resulting interval linear system with guaranteed inclusion bounds, as demonstrated by a numerical example. The results suggest that the Hansen-Bliek-Rohn method provides enclosures that take all round-off errors into account and are as good as worst-case error bounds, with less computational effort, in situations where the theoretical floating-point result fails.
I. INTRODUCTION

This paper presents interval methods for the least squares equations that arise from problems with interval data. When experimental problems are expressed in terms of uncertainties, bounds for the solution set can be obtained through interval arithmetic. In practice, where safety is of paramount importance, interval arithmetic can provide a worst-case guarantee [1] where the traditional floating-point method fails. However, interval arithmetic can give weak bounds because of the dependency problem, which can be traced to the locality of reasoning inherited from arc consistency [2], [3]. This weakness has been overcome in recent work [1].

Applications of Total least squares to problems can be found in signal processing, automatic control, system 
theory, various engineering practices, statistics, physics, economics, biology and medicine and a host of others. 

The statistical approach has long been used to describe how well a model fits an observed set of data points $(T_1, Y_1), (T_2, Y_2), \ldots, (T_n, Y_n)$ of sample size $n$ drawn from a population. When the data points are expressed with uncertainty, however, any computation using the traditional approach undermines the quality of the results, and interval arithmetic becomes a useful tool. Five parameters, $\mu_T$, $\mu_Y$, $\sigma_T$, $\sigma_Y$ and $\rho$, describe the behaviour of the statistical data. One such description [4] begins with elementary statistical analysis in the traditional floating-point setting, in which $\rho$ is defined by the product moment

$$\rho = \frac{E\left[(T - \mu_T)(Y - \mu_Y)\right]}{\sigma_T \sigma_Y}. \tag{1.1}$$

The parameters $\mu_T$, $\mu_Y$ can be estimated by $\bar{T}$ and $\bar{Y}$, where $\bar{T}$ denotes $\frac{1}{n}\sum_{i=1}^{n} T_i$, the average of the variables $T_i$, and $\bar{Y}$ is defined in the same way. The product moment is then written as

$$\frac{1}{n}\sum_{i=1}^{n} (T_i - \bar{T})(Y_i - \bar{Y}).$$

Assuming $\sigma_T$ and $\sigma_Y$ are approximated by $S_T$ and $S_Y$, and denoting the sample estimate of $\rho$ by $r$, the correlation coefficient is written in the form:

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} (T_i - \bar{T})(Y_i - \bar{Y})}{S_T S_Y}. \tag{1.2}$$
The $z$ value for the normal distribution, based on knowledge of the correlation coefficient, is given in the form

$$z = \frac{1}{2}\log_e \frac{1 + r}{1 - r}, \tag{1.3}$$

and this gives a random variable $z$ that is distributed with mean

$$\mu_z = \frac{1}{2}\log_e \frac{1 + \rho}{1 - \rho}$$

and standard deviation

$$\sigma_z = \frac{1}{\sqrt{n - 3}}.$$
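The sample correlation coefficient and the transformation (1.3) can be illustrated with a short floating-point sketch. The function names `pearson_r` and `fisher_z` are hypothetical, and the code assumes that $S_T$ and $S_Y$ are computed with the same $1/n$ normalisation as the product moment; it works on point data, not intervals.

```python
import math

def pearson_r(t, y):
    """Sample correlation coefficient using 1/n product moments."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    cov = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / n
    s_t = math.sqrt(sum((ti - t_bar) ** 2 for ti in t) / n)
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    return cov / (s_t * s_y)

def fisher_z(r, n):
    """Return z from (1.3) and its standard deviation 1/sqrt(n - 3)."""
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    return z, 1.0 / math.sqrt(n - 3)
```

The transformation is undefined for $r = \pm 1$ and for $n \le 3$, so a caller would need to guard against those cases.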
We now discuss the computation of the mean and variance of statistical data that are subject to interval uncertainty. As in [5], the method for computing the mean and variance with interval data is as follows.

First, we note that $\mu = \frac{1}{n}\sum_{i=1}^{n} T_i$ is a monotonically increasing function of each of the $n$ variables $T_1, T_2, \ldots, T_n$, so the interval mean is computed in the form

$$\boldsymbol{\mu} = [\underline{\mu}, \overline{\mu}] = \left[\frac{1}{n}\left(\underline{T}_1 + \underline{T}_2 + \cdots + \underline{T}_n\right),\ \frac{1}{n}\left(\overline{T}_1 + \overline{T}_2 + \cdots + \overline{T}_n\right)\right].$$

Unfortunately, the same cannot be said of the variance. Because of the way it depends on the $T_i$, monotonicity fails, as a result of the dependency problem caused by the locality of reasoning inherited from arc consistency. Besides, its computational complexity grows exponentially as the size $n$ of the data increases. The estimate of the statistic $V(T_1, T_2, \ldots, T_n)$ on the intervals $\{[\underline{T}_1, \overline{T}_1], \ldots, [\underline{T}_n, \overline{T}_n]\}$ is therefore described as follows. First recall that the midpoint of the interval $\mathbf{T}_i = [\underline{T}_i, \overline{T}_i]$ is written as $T_{ic} = \frac{1}{2}(\underline{T}_i + \overline{T}_i)$, and its radius as $\Delta_i = (\overline{T}_i - \underline{T}_i)/2$. Following [6], the range of $V$ can be written as $[V_c - \Delta, V_c + \Delta]$, where

$$V_c = \frac{1}{n}\sum_{i=1}^{n} (T_{ic} - \mu_c)^2$$

is the variance of the midpoint values $T_{1c}, \ldots, T_{nc}$, and $\mu_c$ is the average of the midpoints. The radius $\Delta$ of the variance is defined as

$$\Delta = \sum_{i=1}^{n} \left|\frac{\partial V}{\partial T_i}\right| \Delta_i = \frac{2}{n}\sum_{i=1}^{n} |T_{ic} - \mu_c|\, \Delta_i.$$

The quadratic term for the variance, which is negligibly small in the truncated Taylor series, is given as

$$\Delta^{(2)} = \frac{1}{n}\sum_{i=1}^{n} \left((\Delta T_i)^2 - (\Delta\mu)^2\right), \qquad \Delta\mu = \frac{1}{n}\sum_{i=1}^{n} \Delta T_i.$$

Thus the inclusion interval for the variance $V$ was computed as $\mathbf{V} = [V_c - \Delta,\ V_c + \Delta + \Delta^{(2)}]$.
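The interval mean and a midpoint-radius enclosure of the variance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the variance is assumed to use the $1/n$ normalisation, and the quadratic Taylor term is bounded by the conservative quantity $\frac{1}{n}\sum_i \Delta_i^2$ (an assumption made here so that the result is a true enclosure in exact arithmetic). Ordinary floating point is used, so rounding errors are not controlled.

```python
def interval_mean(lo, hi):
    """Exact range of the mean: it is increasing in every T_i."""
    n = len(lo)
    return sum(lo) / n, sum(hi) / n

def interval_variance_enclosure(lo, hi):
    """Enclose the range of V = (1/n) * sum((T_i - mean)^2) over the box."""
    n = len(lo)
    mid = [(a + b) / 2.0 for a, b in zip(lo, hi)]
    rad = [(b - a) / 2.0 for a, b in zip(lo, hi)]
    mu_c = sum(mid) / n
    v_c = sum((m - mu_c) ** 2 for m in mid) / n
    # First-order radius: sum of |dV/dT_i| * radius_i at the midpoints.
    delta = sum(abs(2.0 * (m - mu_c) / n) * r for m, r in zip(mid, rad))
    # Assumed conservative bound on the (nonnegative) quadratic term.
    delta2 = sum(r * r for r in rad) / n
    # A variance is never negative, so the lower bound is clipped at 0.
    return max(0.0, v_c - delta), v_c + delta + delta2
```

Because $V$ is a quadratic function of the $T_i$, its Taylor expansion about the midpoints terminates after the second-order term, which is why a bound on that single term suffices here.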

The remainder of this paper is structured as follows. Section 2 introduces the least squares problem as it relates to the covariance matrix and to the distribution of the variance as a biased estimator. Section 3 focuses on the extent to which an interval matrix that is not strongly regular can become singular when its radius is enlarged by a certain factor, based on the idea given in [7]. Section 4 gives the numerical method used to solve the interval linear system obtained from total least squares when the variables in the equation are subject to uncertainty, due either to contamination of the measuring instruments used in the model or to noise from inexact observation of the model. We use the Hansen-Bliek-Rohn method, which can be found in [8], [9] and [10], to solve the linear interval system of equations. Section 5 gives a numerical example of the presented methods. Section 6 concludes the paper based on our findings.

II. GENERAL LINEAR CASE 

Consider the general linear model

$$Y_i = \beta_1 T_{i1} + \beta_2 T_{i2} + \cdots + \beta_p T_{ip}, \tag{2.1}$$

where $T_{ij}$ is the $i$-th observation on the $j$-th independent variable and the first variable $T_{i1}$ takes the value 1. The residual is given by

$$e_i = Y_i - (\beta_1 T_{i1} + \beta_2 T_{i2} + \cdots + \beta_p T_{ip}). \tag{2.2}$$

The normal equations are obtained in a general way by minimizing the sum of squared errors:

$$\sum_{j=1}^{n}\left(\sum_{i=1}^{m} T_{ik} T_{ij}\right)\beta_j = \sum_{i=1}^{m} T_{ik} Y_i, \qquad k = 1, \ldots, n, \tag{2.3}$$

where $T$ is an $m \times n$ matrix, $Y$ is an $m \times 1$ vector and $\beta$ is an $n \times 1$ vector. In matrix form, we have the representation

$$(T'T)\beta = T'Y. \tag{2.4}$$

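The condition number of the rectangular matrix $T$ with full column rank is discussed in terms of the 2-norm and, in the sense of [11], is given by

$$T \in R^{m \times n},\ \operatorname{rank}(T) = n \ \Rightarrow\ \kappa_2(T) = \frac{\sigma_{\max}(T)}{\sigma_{\min}(T)},$$

where $\sigma_{\max}(T)$ and $\sigma_{\min}(T)$ denote the largest and smallest singular values of $T$.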
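As an illustration of the normal equations (2.4) and of the 2-norm condition number, the following sketch uses NumPy on hypothetical data (an intercept column and one regressor). The helper names `normal_equations_solve` and `cond2` are assumptions made for this example. It also shows why forming $T'T$ is numerically delicate: the condition number of $T'T$ is the square of that of $T$.

```python
import numpy as np

def normal_equations_solve(T, Y):
    """Solve (T'T) beta = T'Y for a full-column-rank T."""
    A = T.T @ T
    b = T.T @ Y
    return np.linalg.solve(A, b)

def cond2(T):
    """2-norm condition number: largest over smallest singular value."""
    s = np.linalg.svd(T, compute_uv=False)  # sorted in descending order
    return s[0] / s[-1]

# Hypothetical data lying exactly on Y = 1 + 2 t.
T = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])
beta = normal_equations_solve(T, Y)
```

For ill-conditioned $T$, a QR- or SVD-based solver such as `np.linalg.lstsq` is usually preferred over the explicit normal equations.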

Taking $T'T = A$ for a real matrix $A \in R^{n \times n}$, we discuss the statistical interpretation of equation (2.4) as follows:

$$E(\hat{\beta}) = E\left[(T'T)^{-1} T'Y\right] = (T'T)^{-1} T'T\beta = \beta. \tag{2.5}$$

The sampling variance is $\operatorname{var}(\hat{\beta}) = \sigma^2 (T'T)^{-1}$.

It can be shown that $\hat{\sigma}^2$ is a biased estimator of $\sigma^2$, and

$$E\left[\hat{\sigma}^2\right] = \frac{n - m}{n}\,\sigma^2. \tag{2.6}$$

Definition 2.1: Assume that $\hat{\beta} = A^{-1} T'Y$ and

$$\hat{\sigma}^2 = \frac{1}{n - m}(Y - T\hat{\beta})'(Y - T\hat{\beta}) = \frac{1}{n - m}\, Y'(I - T A^{-1} T')Y.$$

Denoting $G = (I - T A^{-1} T')$, the parameters $\hat{\beta}$ and $\hat{\sigma}^2$ are said to be independent if $(A^{-1} T')G = 0$.

As is well known, the distribution of $\hat{\sigma}^2$ is obtained as follows. Since

$$\hat{\sigma}^2 = \frac{(Y - T\hat{\beta})'(Y - T\hat{\beta})}{n - m},$$

it follows that $(n - m)\hat{\sigma}^2 = (Y - T\hat{\beta})'(Y - T\hat{\beta}) = Y'(I - T A^{-1} T')Y$. But $(I - T A^{-1} T')$ is an idempotent matrix; as a result, $(n - m)\hat{\sigma}^2 = Y'(I - T A^{-1} T')Y = Y'GY$, which gives

$$\frac{(n - m)\hat{\sigma}^2}{\sigma^2} = \frac{Y'GY}{\sigma^2}.$$

That is, $(n - m)\hat{\sigma}^2 / \sigma^2$ follows a $\chi^2_{n-m}$ distribution with non-centrality parameter $\lambda = \dfrac{\beta' T' G T \beta}{2\sigma^2}$. As $\lambda \to 0$, this gives the Chi-square distribution.

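III. COMPONENT WISE DISTANCE AS A CONDITION NUMBER TO THE WEIGHT MATRIX E

We are interested in the situation where the coefficients of $T'T \in \mathbf{A}$ are interval entries, where $\mathbf{A} \in IR^{n \times n}$, together with a nonnegative $E \in R^{n \times n}$. For the interval matrix $\mathbf{A} = [A_c - E, A_c + E]$ with lower and upper bounds $\underline{A}$ and $\overline{A}$, we have:

$$A_c = \frac{1}{2}\left(\underline{A} + \overline{A}\right), \qquad E = \frac{1}{2}\left(\overline{A} - \underline{A}\right).$$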
Let $d(A, E)$ be defined as the smallest number $\alpha = d(A, E)$ for which the matrix $[A - \alpha E, A + \alpha E]$ is singular, and set $\omega = 1/\rho(|A^{-1}| E)$; then any matrix $[A - \omega' E, A + \omega' E]$ with $\omega' < \omega$ is strongly regular. In [12] it was proved that the ratio

$$\frac{d(A, E)}{\rho(|A^{-1}| E)^{-1}}$$

is the distance between strong regularity and regularity of the matrix $A$. Thus the Bauer-Skeel condition number, written as $\operatorname{cond}(A, E) = \left\| |A^{-1}| E \right\|$ for some norm $\|\cdot\|$, relates the componentwise distance, as a condition number, to the weight matrix $E$. A bound within which such a matrix $A$ can become singular, given that $1/\rho(|A^{-1}| E) \le d(A, E)$, is:

$$\frac{1}{\rho(|A^{-1}| E)} \le d(A, E) \le \frac{(3 + 2\sqrt{2})\, n}{\rho(|A^{-1}| E)}. \tag{3.1}$$

Equation (3.1) states that there exists a bound $n \le \eta(n) \le (3 + 2\sqrt{2})\, n$ for which an interval matrix that is not strongly regular will produce a singular matrix if its radius is increased by a factor of at most $(3 + 2\sqrt{2})\, n$. As advised in [7], the factor $\dfrac{(3 + 2\sqrt{2})\, n}{\rho(|A^{-1}| E)}$ cannot be replaced by the term $\dfrac{3 + 2\sqrt{2}}{\rho(|A^{-1}| E)}$.

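Assume that equation (2.4), which is in normal form, is written as the linear interval system $\mathbf{A}\beta = \mathbf{b}$, with $T'T \in \mathbf{A}$ and $T'Y \in \mathbf{b}$. We then aim to solve for the unknown interval vector $\boldsymbol{\beta} \in IR^{n}$ in order to obtain the hull of the solution set

$$\Sigma(\mathbf{A}, \mathbf{b}) = \{\beta \in R^{n} \mid \exists A \in \mathbf{A},\ \exists b \in \mathbf{b} : A\beta = b\}. \tag{3.2}$$

Several characterizations of solution sets related to (3.2) exist. For example, [13] gave three types of such solution sets, among others, as follows.

The tolerable solution set:

$$\Sigma_{\forall\exists}(\mathbf{A}, \mathbf{b}) = \{\beta \in R^{n} \mid (\forall A \in \mathbf{A})(\exists b \in \mathbf{b})\ A\beta = b\}, \tag{3.3}$$

the controllable solution set:

$$\Sigma_{\exists\forall}(\mathbf{A}, \mathbf{b}) = \{\beta \in R^{n} \mid (\forall b \in \mathbf{b})(\exists A \in \mathbf{A})\ A\beta = b\}, \tag{3.4}$$

and the united solution set:

$$\Sigma_{\exists\exists}(\mathbf{A}, \mathbf{b}) = \{\beta \in R^{n} \mid (\exists A \in \mathbf{A})(\exists b \in \mathbf{b})\ A\beta = b\}. \tag{3.5}$$

The symbols $\forall$ and $\exists$ appearing above are the universal and existential quantifiers.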
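Membership of a point in the united solution set (3.5) can be checked with the Oettli-Prager criterion, $|A_c x - b_c| \le E|x| + \delta$ (componentwise), where $\delta$ denotes the radius of the right-hand side $\mathbf{b} = [b_c - \delta, b_c + \delta]$. The sketch below is a plain floating-point illustration; the function name and the symbol `delta` are assumptions introduced for this example, and the small test system is hypothetical.

```python
import numpy as np

def in_united_solution_set(x, Ac, E, bc, delta):
    """Oettli-Prager test: |Ac x - bc| <= E|x| + delta, componentwise."""
    x = np.asarray(x, dtype=float)
    residual = np.abs(Ac @ x - bc)
    slack = E @ np.abs(x) + delta
    return bool(np.all(residual <= slack))

# Hypothetical system: diagonal entries in [1.5, 2.5], exact right-hand side.
Ac = np.array([[2.0, 0.0], [0.0, 2.0]])
E = np.array([[0.5, 0.0], [0.0, 0.5]])
bc = np.array([2.0, 2.0])
delta = np.zeros(2)

inside = in_united_solution_set([1.0, 1.0], Ac, E, bc, delta)
outside = in_united_solution_set([2.0, 2.0], Ac, E, bc, delta)
```

Points very close to the boundary of the solution set may be misclassified because the comparison is made in ordinary floating-point arithmetic.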



IV. THE INCLUSION METHODS FOR EQUATION 2.4 

The inclusion method under consideration for the solution of equation (2.4) is that of Hansen-Bliek-Rohn [10], where the united solution set is used.

The enclosure for the inverse of the interval matrix $\mathbf{A} = [A_c - E, A_c + E]$, provided that $\rho(|A_c^{-1}| E) < 1$, may in the sense of [9] and [10] be written as:

$$\mathbf{A}^{-1} \subseteq \left[\min\{\underline{B},\ T_\nu \underline{B}\},\ \max\{\overline{B},\ T_\nu \overline{B}\}\right], \tag{4.1}$$

where

$$M = (I - |A_c^{-1}| E)^{-1},$$

$$\mu = (M_{jj}), \quad j = 1, 2, \ldots, n,$$

$$T_\nu = (2T_\mu - I)^{-1},$$

$$\underline{B} = -M|A_c^{-1}| + T_\mu (A_c^{-1} + |A_c^{-1}|),$$

$$\overline{B} = M|A_c^{-1}| + T_\mu (A_c^{-1} - |A_c^{-1}|),$$

and $T_\mu$ denotes the diagonal matrix whose diagonal is the vector $\mu$.

In equation (4.1), r is an orthant that is crucial for the existence of the Hansen-Bliek-Rohn method (see, e.g., [8] for an overview), provided $\rho(|A_c^{-1}| E) < 1$.
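The inverse enclosure (4.1) can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than a verified implementation: the function name `hbr_inverse_enclosure` is hypothetical, $T_\mu$ is taken to be the diagonal matrix formed from the diagonal of $M$, and all operations use ordinary floating point without directed rounding, so the computed bounds are not rigorous in the presence of round-off.

```python
import numpy as np

def hbr_inverse_enclosure(Ac, E):
    """Bounds (lower, upper) on inv(A) for all A in [Ac - E, Ac + E]."""
    n = Ac.shape[0]
    Ac_inv = np.linalg.inv(Ac)
    abs_inv = np.abs(Ac_inv)
    # Strong regularity check: spectral radius of |Ac^-1| E must be < 1.
    rho = np.max(np.abs(np.linalg.eigvals(abs_inv @ E)))
    if rho >= 1.0:
        raise ValueError("rho(|Ac^-1| E) >= 1: enclosure not applicable")
    M = np.linalg.inv(np.eye(n) - abs_inv @ E)
    T_mu = np.diag(np.diag(M))                      # diagonal part of M
    T_nu = np.linalg.inv(2.0 * T_mu - np.eye(n))
    B_lo = -M @ abs_inv + T_mu @ (Ac_inv + abs_inv)
    B_hi = M @ abs_inv + T_mu @ (Ac_inv - abs_inv)
    # Entrywise min/max as in (4.1).
    lower = np.minimum(B_lo, T_nu @ B_lo)
    upper = np.maximum(B_hi, T_nu @ B_hi)
    return lower, upper
```

For a $1 \times 1$ matrix $[a - d, a + d]$ with $a > d > 0$, the formulas reduce to $[1/(a + d),\ 1/(a - d)]$, the exact range of the inverse, which gives a quick sanity check of the reconstruction.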

The error bound for the computed approximate solution $\beta$ relative to the true solution $\beta^*$, obtained by using Rohn's method (4.1) as described in [10], is derived as follows.

$$\beta - \beta^* = (T'T)^{-1} T'T\beta - (T'T)^{-1} T'Y = (T'T)^{-1}(T'T\beta - T'Y). \tag{4.2}$$

Taking norms of both sides of equation (4.2), in the sense of [14], gives

$$\|\beta - \beta^*\|_2 = \|(T'T)^{-1} T'(T\beta - Y)\|_2 \le \|(T'T)^{-1}\|_2 \cdot \|T'(T\beta - Y)\|_2. \tag{4.3}$$

Using the right-hand side of equation (4.3), and noting that $T'(T\beta^* - Y) = 0$ by (2.4), we have

$$\|(T'T)^{-1}\| \cdot \|T'(T\beta - Y)\| = \|(T'T)^{-1}\| \cdot \|T'(T\beta - T\beta^* + T\beta^* - Y)\| = \|(T'T)^{-1}\| \cdot \|T'T(\beta - \beta^*)\| \le \kappa(T)^2\, \|\beta - \beta^*\|.$$

The error bound on $\|\beta - \beta^*\|$ is therefore governed by the condition number $\kappa(T)$ of the matrix $T$: because the normal equations involve $T'T$, the factor multiplying $\|\beta - \beta^*\|$ is the second power of $\kappa(T)$.

V. IMPLEMENTATION AND EXPERIMENTATION 

Problem 1. 

Consider the following problem, taken from [15], of regressing $Y$ on $T$ with the model $Y = \beta_0 \ln T + \beta_1 \cos T + \beta_2 e^{T}$. The equation is

$$\varphi(\beta_0, \beta_1, \beta_2) = \sum_{i} \left(\beta_0 \varphi(\ln T_i) + \beta_1 \varphi(\cos T_i) + \beta_2 \varphi(e^{T_i}) - Y_i\right)^2. \tag{5.1}$$

The data points are

T        Y
0.24     0.23
0.65    -0.26
0.95    -1.10
1.24    -0.45
1.73     0.27
2.01     0.10
2.23    -0.29
2.52     0.24
2.77     0.56
2.99     1.00


Take the data noise to be $\varepsilon = 1\%$ for each data point $(T_i, Y_i)$. The result obtained from the normal equations without noise on the data, using MATLAB (2007 version), is $(-1.04103, -1.26132, 0.03073)^T$.
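The noise-free fit can be reproduced in outline with the following NumPy sketch, which builds the design matrix with columns $\ln T$, $\cos T$ and $e^{T}$ from the data table and solves the normal equations (2.4). The variable names are assumptions made for this example, and the sketch does not attempt to reproduce the interval computation; it only covers the point estimate, which can be compared against the value reported above.

```python
import numpy as np

# Data points from the table in Problem 1.
T = np.array([0.24, 0.65, 0.95, 1.24, 1.73, 2.01, 2.23, 2.52, 2.77, 2.99])
Y = np.array([0.23, -0.26, -1.10, -0.45, 0.27, 0.10, -0.29, 0.24, 0.56, 1.00])

# Design matrix with columns ln T, cos T, e^T.
Phi = np.column_stack([np.log(T), np.cos(T), np.exp(T)])

# Point least squares estimate via the normal equations (Phi'Phi) beta = Phi'Y.
beta = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)

# Value of the objective (5.1) at the estimate.
residual_sum_of_squares = float(np.sum((Phi @ beta - Y) ** 2))
```

An interval version would replace `Phi.T @ Phi` and `Phi.T @ Y` by a midpoint matrix and radius matrix derived from the 1% noise model, and then apply an enclosure method to the resulting interval system.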
The result obtained using the Hansen-Bliek-Rohn method (4.1) for $(\beta_0, \beta_1, \beta_2)^T$, where the data under investigation were subjected to noise of $\varepsilon = 1\%$, was found to be

$$([-1.0312, -1.0085],\ [-1.2757, -1.1951],\ [0.0305, 0.0309])^T.$$

Therefore the regression equation is

$$Y(T) = [-1.0312, -1.0085]\,\varphi(\ln T) + [-1.2757, -1.1951]\,\varphi(\cos T) + [0.0305, 0.0309]\,\varphi(e^{T}).$$

Note that the solution set obtained with the Hansen-Bliek-Rohn method (4.1) is closed in the sense of the Oettli-Prager theorem [13]. Note also that the mapping $(A, b) \mapsto (A^{T}A)^{-1}A^{T}b$ is compact and continuous in the least squares sense.

Taking the value of $T$ in the interval $[3.0, 3.5]$, the value of equation 4.2 in midpoint-radius form was computed to be

$$Y[T] = \langle -1.508421816,\ 0.444770179 \rangle.$$

By the same reasoning, the value $\varphi(\beta_0, \beta_1, \beta_2) = -1.693769447$ was computed for the midpoint of the interval $[3.0, 3.5]$.

VI. CONCLUSION 

This paper considered the total least squares method for a statistical data set subject to interval uncertainty. The Hansen-Bliek-Rohn method was applied to the least squares problem and provided tight inclusion bounds with less computational effort. Our example demonstrated that the result computed with the Hansen-Bliek-Rohn method is close to the guaranteed worst-case error bound, which takes all round-off errors into account, in contrast to the theoretical floating-point result. We also examined the regularity condition of the interval matrix arising from the system of equations (2.4), taking into consideration the distance between strong regularity and regularity of the interval matrix A in a sense analogous to [7], where the Bauer-Skeel condition number plays a crucial role in relating the componentwise distance, as a condition number, to the weight matrix E.